Smart submit mode #110

Open
wants to merge 27 commits into
base: master

olszowski

Work in progress, although the main concept is in place. It still has some bugs, so I need to spend a bit more time on it.

Collaborator

grzanka commented Jun 26, 2018

@olszowski can you rebase to resolve the conflicts?

@@ -1,5 +1,6 @@
import logging
import os

Collaborator

Please avoid whitespace changes

codecov bot commented Jun 30, 2018

Codecov Report

Merging #110 into master will increase coverage by 1.85%.
The diff coverage is 83.72%.

@@            Coverage Diff             @@
##           master     #110      +/-   ##
==========================================
+ Coverage   63.82%   65.67%   +1.85%     
==========================================
  Files          12       13       +1     
  Lines         868      976     +108     
==========================================
+ Hits          554      641      +87     
- Misses        314      335      +21
Impacted Files                         Coverage Δ
mcpartools/generatemc.py               85.1% <100%> (+3.05%) ⬆️
mcpartools/scheduler/slurm.py          100% <100%> (ø) ⬆️
mcpartools/scheduler/base.py           83.05% <63.63%> (-5.84%) ⬇️
mcpartools/generator.py                71.87% <65.51%> (-1.62%) ⬇️
mcpartools/scheduler/smart/slurm.py    89.7% <89.7%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 341d2ee...76935ab. Read the comment docs.

@@ -11,5 +11,6 @@ def __init__(self, options_content):
        JobScheduler.__init__(self, options_content)

    submit_script_template = os.path.join('data', 'submit_slurm.sh')
    smart_submit_script_template = os.path.join('data', 'smart_submit_slurm.sh.j2')
Collaborator

Why the .j2 suffix?

Author

It's a Jinja2 template; I wanted that to be explicit.

Collaborator

ok

node_ids = []
for node in nodes_sorted:
    count = int(round(node.cpu_idle * ratio))
    from itertools import repeat
Collaborator

Why do the import inside the loop?
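
For illustration, the loop under review could keep the import at module level so it is stated once rather than re-executed on every iteration. This is only a sketch: the diff does not show what happens to `count` after the import, so repeating a node identifier `count` times is an assumption, and the `Node` tuple with its `node_id` field and the function name `node_ids_for_jobs` are hypothetical stand-ins (only `cpu_idle`, `nodes_sorted`, `ratio` and `node_ids` appear in the diff).

```python
from collections import namedtuple
from itertools import repeat  # imported once, at module level

# Hypothetical stand-in for the PR's node objects; only `cpu_idle` is in the diff.
Node = namedtuple("Node", ["node_id", "cpu_idle"])


def node_ids_for_jobs(nodes_sorted, ratio):
    """Repeat each node's id in proportion to its idle CPUs (assumed behaviour)."""
    node_ids = []
    for node in nodes_sorted:
        count = int(round(node.cpu_idle * ratio))
        node_ids.extend(repeat(node.node_id, count))
    return node_ids


if __name__ == "__main__":
    nodes = [Node("p0001", 4), Node("p0002", 2)]
    print(node_ids_for_jobs(nodes, 0.5))
```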

olszowski changed the title from "[WIP] Smart submit mode" to "Smart submit mode" on Jul 18, 2018
def get_cluster_state_from_os():
    from subprocess import check_output, STDOUT
    from shlex import split
    command = "sinfo --states='idle,mixed' --partition=plgrid --format='%n %P %O %T %C'"
Collaborator

Why is the plgrid partition hardcoded?
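
One way to avoid the hardcoded value is to pass the partition in as an argument and build the `sinfo` command from it. The sketch below reuses the command string from the diff; the helper name `build_sinfo_command` and the choice of `plgrid` as a default are assumptions made for illustration, not the PR's actual code.

```python
from shlex import split


def build_sinfo_command(partition="plgrid"):
    """Return the sinfo argument list for the given partition.

    The default keeps the behaviour from the diff; callers can override it.
    The partition name is interpolated as-is, so it is assumed to contain
    no whitespace or shell quoting characters.
    """
    command = (
        "sinfo --states='idle,mixed' --partition={0} "
        "--format='%n %P %O %T %C'"
    ).format(partition)
    return split(command)


if __name__ == "__main__":
    print(build_sinfo_command("plgrid"))
```

The resulting list can be handed directly to `subprocess.check_output`, which takes an argument list without needing a shell.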

Collaborator

grzanka commented Jul 25, 2018

Two things:

  • please update your fork; it is a bit behind master on the main repo
  • the code is crashing, see below:
# plgkongruencj at login01.pro.cyfronet.pl in ~ [22:55:12]
→ srun -A ccbmc7 --ntasks-per-node=1 --ntasks=1  --time=0:59:00 --pty /bin/bash -l         
srun: job 12331531 queued and waiting for resources
srun: job 12331531 has been allocated resources
 plgrid/tools/zsh/5.2 unloaded.
 plgrid/tools/zsh/5.2 loaded.
to make magic type:
source $PLG_GROUPS_SHARED/plggccbmc/software/setup.sh

# plgkongruencj at p0576 in ~ [22:55:19]
→ module load plgrid/tools/python-intel/3.6.2                                     
 plgrid/tools/gcc/6.4.0 loaded.
 plgrid/tools/intel/18.0.0 loaded.
 plgrid/tools/impi/2017.3 loaded.
 plgrid/libs/mkl/2017.0.3 loaded.
 plgrid/tools/python-intel/3.6.2 loaded.

# plgkongruencj at p0576 in ~ [22:55:31]
→ cd $SCRATCH

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj [22:55:34]
→ mkdir daniel

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj [22:55:36]
→ cd daniel 

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj/daniel [22:55:38]
→ git clone https://github.com/olszowski/mcpartools.git
Cloning into 'mcpartools'...
remote: Counting objects: 895, done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 895 (delta 34), reused 42 (delta 26), pack-reused 834
Receiving objects: 100% (895/895), 219.91 KiB | 0 bytes/s, done.
Resolving deltas: 100% (515/515), done.

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj/daniel [22:55:42]
→ git checkout feature/smart_submit_mode                                                                                                          
fatal: Not a git repository (or any parent up to mount point /net/scratch)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj/daniel [22:55:48]
→ cd mcpartools 

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj/daniel/mcpartools on git:master ● [22:55:52]
→ git checkout feature/smart_submit_mode
Branch feature/smart_submit_mode set up to track remote branch feature/smart_submit_mode from origin.
Switched to a new branch 'feature/smart_submit_mode'

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj/daniel/mcpartools on git:feature/smart_submit_mode ● [22:55:53]
→ versioneer install                             
versioneer.py (0.17) installed into local tree
Now running 'versioneer.py setup' to install the generated files..
 creating mcpartools/_version.py
 appending to mcpartools/__init__.py
 appending 'versioneer.py' to MANIFEST.in
 appending versionfile_source ('mcpartools/_version.py') to MANIFEST.in

# plgkongruencj at p0576 in /net/scratch/people/plgkongruencj/daniel/mcpartools on git:feature/smart_submit_mode ✖︎ [22:55:56]
→ PYTHONPATH=. python mcpartools/generatemc.py ../../spc_sh12a/ -w w -p 10000 -j 20 --smart
WARNING:mcpartools.generator:Workspace dir w doesn't exists, will be created.
INFO:mcpartools.generator:Creating directory: w
Traceback (most recent call last):
  File "mcpartools/generatemc.py", line 121, in <module>
    sys.exit(main(sys.argv[1:]))
  File "mcpartools/generatemc.py", line 115, in main
    ret_code = generator.run()
  File "/net/scratch/people/plgkongruencj/daniel/mcpartools/mcpartools/generator.py", line 163, in run
    self.generate_submit_script(smart=self.options.smart_options)
  File "/net/scratch/people/plgkongruencj/daniel/mcpartools/mcpartools/generator.py", line 219, in generate_submit_script
    cluster_state = get_cluster_state_from_os(partition=smart.partition)
  File "/net/scratch/people/plgkongruencj/daniel/mcpartools/mcpartools/scheduler/smart/slurm.py", line 100, in get_cluster_state_from_os
    cluster_info = cluster_status_from_raw_stdout(output)
  File "/net/scratch/people/plgkongruencj/daniel/mcpartools/mcpartools/scheduler/smart/slurm.py", line 82, in cluster_status_from_raw_stdout
    splitted_output = std_out.split("\n")[1:]
TypeError: a bytes-like object is required, not 'str'
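
The traceback ends at `std_out.split("\n")`: the session runs Python 3.6.2, where `subprocess.check_output` returns `bytes` by default, and `bytes.split` rejects a `str` separator. A minimal sketch of one possible fix is to decode before splitting. The function name `lines_after_header` is hypothetical and the UTF-8 encoding is an assumption; only the `split("\n")[1:]` step comes from the traceback.

```python
def lines_after_header(std_out):
    """Split sinfo output into lines, dropping the header row.

    Accepts both bytes (Python 3 check_output default) and text, so the
    same code path works whether or not the caller already decoded.
    """
    if isinstance(std_out, bytes):
        # Assumption: the scheduler output is UTF-8 (or plain ASCII).
        std_out = std_out.decode("utf-8")
    return std_out.split("\n")[1:]


if __name__ == "__main__":
    raw = b"HOSTNAMES PARTITION CPU_LOAD STATE CPUS(A/I/O/T)\nrow one\nrow two"
    print(lines_after_header(raw))
```

An alternative is to call `check_output(..., universal_newlines=True)`, which makes the subprocess return text on both Python 2.7 and Python 3.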

@reviewpad reviewpad bot mentioned this pull request Mar 28, 2023
9 tasks